Delays in Reinforcement Learning
Delays are inherent to most dynamical systems. Besides shifting a process in
time, they can significantly affect its performance. For this reason, it is
usually valuable to study the delay and account for it. Since sequential
decision-making problems such as Markov decision processes (MDPs) are
themselves dynamical systems, it is no surprise that they too can be affected
by delays. These processes are the foundational framework of reinforcement
learning (RL), a paradigm whose goal is to create artificial agents capable of
learning to maximise their utility by interacting with their environment.
RL has achieved strong, sometimes astonishing, empirical results, but delays
are seldom explicitly accounted for, and the impact of delays on the MDP
remains poorly understood. In this dissertation, we study delays in the
agent's observation of the state of the environment or in the execution of
the agent's actions. We repeatedly change our point of view on the problem
to reveal some of its structure and peculiarities. A wide spectrum of delays
is considered, and potential solutions are presented. This dissertation also
aims to draw links between celebrated frameworks of the RL literature and
that of delays.
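For concreteness, a standard construction for a constant action delay reduces
the delayed process back to an MDP by augmenting the state with the queue of
pending actions. The minimal sketch below assumes the classic four-tuple Gym
step API; the wrapper name and the fixed default action used to pre-fill the
buffer are illustrative choices, not part of the dissertation. An observation
delay can be handled symmetrically by buffering observations instead.

```python
from collections import deque

import numpy as np


class ConstantActionDelayWrapper:
    """Delays action execution by `delay` steps; until the queue is full,
    a fixed default action is executed instead (illustrative sketch)."""

    def __init__(self, env, delay, default_action):
        self.env = env
        self.delay = delay
        self.default_action = default_action
        self.queue = deque(maxlen=delay)

    def reset(self):
        obs = self.env.reset()
        # Pre-fill the buffer so the first `delay` steps are well defined.
        self.queue = deque([self.default_action] * self.delay,
                           maxlen=self.delay)
        return self._augment(obs)

    def step(self, action):
        # The action chosen now is executed `delay` steps in the future;
        # the oldest queued action is the one applied to the environment.
        executed = self.queue.popleft()
        self.queue.append(action)
        obs, reward, done, info = self.env.step(executed)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Augmented state: last observed state plus all pending actions.
        return np.concatenate([np.asarray(obs).ravel(),
                               np.asarray(self.queue).ravel()])
```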
Delayed Reinforcement Learning by Imitation
When the agent's observations or interactions are delayed, classic
reinforcement learning tools usually fail. In this paper, we propose a simple
yet efficient solution to this problem. We assume that an efficient policy for
the undelayed environment is known or can easily be learned, but that the task
suffers from delays in practice, which we therefore want to take into account.
We present a novel algorithm, Delayed Imitation with Dataset Aggregation
(DIDA), which builds upon imitation learning methods to learn how to act in a
delayed environment from undelayed demonstrations. We provide a theoretical
analysis of the approach that guides the practical design of DIDA. These
results are also of general interest in the delayed reinforcement learning
literature, as they provide bounds on the performance gap between delayed and
undelayed tasks under smoothness conditions. We show empirically that DIDA
achieves high performance with remarkable sample efficiency on a variety of
tasks, including robotic locomotion, classic control, and trading.
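To make the imitation loop concrete, below is a minimal DAgger-style sketch of
the idea, not the authors' implementation. It relies on hypothetical hooks:
expert(state) returns the undelayed expert action for the true state,
policy.fit / policy.act form a generic supervised learner, and the delayed
simulator exposes the true underlying state (true_state) for labelling during
training only.

```python
import numpy as np


def dida_sketch(delayed_env, expert, policy, n_iterations, horizon, beta=0.9):
    """DAgger-style imitation from undelayed demonstrations (illustrative)."""
    xs, ys = [], []  # aggregated dataset of (augmented state, expert action)
    for i in range(n_iterations):
        mix = beta ** i  # probability of executing the expert's action
        aug_state = delayed_env.reset()
        true_state = delayed_env.true_state  # hypothetical training-time hook
        for _ in range(horizon):
            # Label the delayed (augmented) state with the action the
            # undelayed expert takes in the current TRUE state.
            label = expert(true_state)
            xs.append(aug_state)
            ys.append(label)
            # Execute a mixture of expert and learner actions, as in DAgger.
            if i == 0 or np.random.rand() < mix:
                action = label
            else:
                action = policy.act(aug_state)
            aug_state, _, done, info = delayed_env.step(action)
            true_state = info["true_state"]  # hypothetical hook
            if done:
                break
        # Aggregate and retrain the delayed policy on all data so far.
        policy.fit(np.array(xs), np.array(ys))
    return policy
```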
Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for current reinforcement learning algorithms, yet it would be a much-needed feature for practical applications. In this paper, we propose an approach that learns a hyper-policy which, taking time as input, outputs the parameters of the policy to be queried at that time. This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias. We combine the future performance estimate with the past performance to mitigate catastrophic forgetting. To avoid overfitting to the collected data, we derive a differentiable variance bound that we embed as a penalization term. Finally, we empirically validate our approach against state-of-the-art algorithms on realistic environments, including water resource management and trading.
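As a rough illustration of this objective, the sketch below uses plain
(single-distribution) importance sampling and an empirical-variance penalty as
stand-ins for the paper's multiple importance sampling estimator and
differentiable variance bound; all names (HyperPolicy, is_objective, ...) are
illustrative assumptions, not the paper's API.

```python
import torch


class HyperPolicy(torch.nn.Module):
    """Maps a (normalised) time index to a Gaussian over policy parameters."""

    def __init__(self, theta_dim, log_std=-1.0):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1, 32), torch.nn.Tanh(),
            torch.nn.Linear(32, theta_dim))
        self.log_std = torch.nn.Parameter(torch.full((theta_dim,), log_std))

    def log_prob(self, t, theta):
        # Density the hyper-policy assigns to parameters theta at time t.
        mean = self.net(t)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        return dist.log_prob(theta).sum(-1)


def is_objective(hp, t, theta, ret, behav_logp, t_future, lam=0.1):
    """Importance-sampling estimate of performance at a future time,
    penalised by the empirical variance of the weights (a simplified
    stand-in for the paper's differentiable variance bound).

    t: (N, 1) past times; theta: (N, D) sampled policy parameters;
    ret: (N,) returns; behav_logp: (N,) behavioural log-densities.
    """
    tf = torch.full_like(t, t_future)  # evaluate hyper-policy at future time
    # Reweight past (theta, return) pairs by the density the current
    # hyper-policy assigns to theta at the future time.
    w = torch.exp(hp.log_prob(tf, theta) - behav_logp)
    j_hat = (w * ret).mean()
    return j_hat - lam * w.var()
```

In the paper, the weights are instead computed against a mixture of the past
(behavioural) hyper-policies, which is what makes the estimator a multiple
importance sampling one and controls its variance.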